1/1 


flO-USS  711 
UNCLASSIFIED 


INEXPERT CALIBRATION OF COMPREHENSION (U) WISCONSIN
CENTER FOR EDUCATION RESEARCH MADISON
A M GLENBERG ET AL. 01 MAR 86 WCER-86-2
N00014-85-K-0644 F/G 5/10


Unclassified _ 

SECURITY  CLASSIFICATION  OF  THIS  PAGE 




REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION

Unclassified 

1b  RESTRICTIVE  MARKINGS 

2a  SECURITY  CLASSIFICATION  AUTHORITY 

3  DISTRIBUTION /AVAILABILITY  OF  REPORT 

Approved  for  public  release; 
distribution  unlimited 

2b  DECLASSIFICATION  /  DOWNGRADING  SCHEDULE 

4  PERFORMING  ORGANIZATION  REPORT  NUMBER(S) 

WCER  Program  Report  86-3 

5. MONITORING ORGANIZATION REPORT NUMBER(S)

6a  NAME  OF  PERFORMING  ORGANIZATION 
Wisconsin  Center  for 

Education  Research 

6b  OFFICE  SYMBOL 
(If  applicable) 

7a  NAME  OF  MONITORING  ORGANIZATION 

Personnel  and  Training  Research  Programs 
Office  of  Naval  Research  (Code  1142PT) 

6c. ADDRESS (City, State, and ZIP Code)

1025  W.  Johnson  Street 

Madison,  WI  53706 

7b ADDRESS (City, State, and ZIP Code)

800  North  Quincy  Street 

Arlington,  VA  22217-5000 

8a.  NAME  OF  FUNDING /SPONSORING 
ORGANIZATION 

8b. OFFICE SYMBOL
(If  applicable) 

9  PROCUREMENT  INSTRUMENT  IDENTIFICATION  NUMBER 

N0014-85-K-0644 

8c. ADDRESS (City, State, and ZIP Code)

10  SOURCE  OF  FUNDING  NUMBERS 

PROGRAM 

PROJECT 

TASK 

ELEMENT  NO 

NO 

NO 

61153N 

RR04206 

OC 

WORK  UNIT 
ACCESSION  NO 

NR702-012 


11 TITLE (Include Security Classification)

Inexpert  Calibration  of  Comprehension  (Unclassified) 


12  PERSONAL  AUTHOR(S) 

Glenberg,  Arthur  M.,  &  Epstein,  William 


13a  TYPE  OF  REPORT 

Technical  Report 


16 SUPPLEMENTARY NOTATION


13b TIME COVERED

14. DATE OF REPORT (Year, Month, Day)

86-3-1

Under  review 


FIELD 

GROUP 

05 

10 

COSATI  CODES 


SUB-GROUP 


18  SUBJECT  TERMS  (Continue  on  reverse  if  necessary  and  identify  by  block  number) 
comprehension, meta-comprehension, expert knowledge


19 ABSTRACT (Continue on reverse if necessary and identify by block number)

Students with a wide range of coursework in physics or music theory read expositions
in both domains. After reading, for each text students provided a judgment of confidence in
ability to verify inferences based on the central principle of the text. The primary
dependent variable was calibration of comprehension, the degree of association between
confidence and performance on the inference test. Two results of most interest were (a)
expertise in a domain was inversely related to calibration and (b) subjects were well-calibrated
across domains. Both of these results can be accommodated by a self-classification strategy:
Confidence judgments are based on self-classification as expert or non-expert in the domain
of the text, rather than an assessment of the degree to which the text was comprehended.
Because self-classifications are not well differentiated within a domain, application of
the strategy by experts produces poor calibration within a domain. Nonetheless, because
self-classification is generally consistent with performance across domains, application
of the strategy produces calibration across domains.


20 DISTRIBUTION/AVAILABILITY OF ABSTRACT
□ UNCLASSIFIED/UNLIMITED  □ SAME AS RPT  □ DTIC USERS

21 ABSTRACT SECURITY CLASSIFICATION
Unclassified

22a NAME OF RESPONSIBLE INDIVIDUAL
Dr. Michael Shafto

22b TELEPHONE (Include Area Code)
202-696-4596

22c OFFICE SYMBOL
ONR 1142PT

DD FORM 1473, 84 MAR
83 APR edition may be used until exhausted. All other editions are obsolete.

SECURITY CLASSIFICATION OF THIS PAGE
Unclassified




Program  Report  86-3 
Inexpert  Calibration  of  Comprehension 


Arthur  M.  Glenberg  and  William  Epstein 
University  of  Wisconsin 


Wisconsin  Center  for  Education  Research 
School  of  Education 
University of Wisconsin-Madison
March  1,  1986 



This work was supported by National Institute of Education Grant NIE-G-84-0008
and by Personnel and Training Research Programs, Psychological Sciences
Division, Office of Naval Research, under Contract No. N00014-85-K-0644, Contract
Authority Identification Number, NR 702-012. Approved for public release;
distribution unlimited. Reproduction in whole or part is permitted for any
purpose of the United States Government.


WISCONSIN  CENTER  FOR  EDUCATION  RESEARCH 


MISSION STATEMENT


The mission of the Wisconsin Center for Education Research is to improve
the quality of American education for all students. Our goal is that
future generations achieve the degree of knowledge, tolerance,
sensitivity, and complex thinking skills necessary to ensure a productive
and enlightened democratic society. We are willing to explore solutions
to major problems, recognizing that radical change may be necessary to
meet our goal.

Our approach is interdisciplinary because the problems of education in
the United States go far beyond pedagogy. We therefore draw on the
knowledge of scholars in psychology, sociology, history, economics,
philosophy, and law as well as experts in teacher education, curriculum,
and administration in order to arrive at a deeper understanding of
schooling.

Work of the Center clusters in four broad areas:

. Learning and Development focuses on individuals, in particular
on their variability in basic learning and development processes.

. Classroom Processes seeks to adapt psychological constructs to
the improvement of classroom learning and instruction.

. School Processes focuses on schoolwide issues and variables,
seeking to identify administrative and organizational practices
that are particularly effective.

. Social Policy is directed toward delineating the conditions
under which social policy is likely to succeed, the ends to
which it is suited, and the constraints which it faces.

The Wisconsin Center for Education Research is a noninstructional unit
of the University of Wisconsin-Madison School of Education. The Center
is supported primarily with funds from the Office of Educational Research
and Improvement/Department of Education, the National Science Foundation,
and other governmental and non-governmental sources in the U.S.


Abstract 


Students with a wide range of coursework in physics or music theory read
expositions in both domains. After reading, for each text students provided a
judgment of confidence in ability to verify inferences based on the central
principle of the text. The primary dependent variable was calibration of
comprehension, the degree of association between confidence and performance on
the inference test. Two results of most interest were (a) expertise in a domain
was inversely related to calibration and (b) subjects were well-calibrated
across domains. Both of these results can be accommodated by a
self-classification strategy: Confidence judgments are based on
self-classification as expert or non-expert in the domain of the text, rather
than an assessment of the degree to which the text was comprehended. Because
self-classifications are not well differentiated within a domain, application of
the strategy by experts produces poor calibration within a domain. Nonetheless,
because self-classification is generally consistent with performance across
domains, application of the strategy produces calibration across domains.




A reader's self-assessment of comprehension often has significant
consequences for the reader's actions. When reading under time constraints, the
reader's belief that comprehension has been achieved will encourage the reader
to terminate further processing of the text. When reading in preparation for
testing, the belief that comprehension has been attained will lead the reader to
declare his readiness for testing. Given these and other implications for
action, it is sensible to inquire whether readers' beliefs are regularly valid.
Taking as our measure the relationship between readers' self-assessments of
confidence in comprehension (strength of belief) and performance on a test of
comprehension, we have repeatedly found that readers' beliefs typically are off
the mark. Readers are very poorly calibrated; confidence in comprehension
(belief) does not predict performance.

Glenberg and Epstein (1985) measured calibration by having subjects read 15
short expositions on a variety of topics. Subjects also provided an assessment
of their confidence in ability to use a principle from the text (provided at the
time of the confidence assessment) to judge whether or not an inference was
correct. Finally, subjects attempted to decide if an inference using the
principle was or was not valid. One measure of calibration of comprehension is
the point biserial correlation between the confidence assessments and
performance on the inference test. In none of three experiments reported by
Glenberg and Epstein was this correlation significantly different from zero.
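
To make the point biserial measure concrete, the following is a minimal
sketch in Python, not the authors' analysis code. It assumes one subject's
data are held in two equal-length lists: confidence ratings and 0/1 scores on
the inference test. The function name and the sample values are hypothetical.
The point biserial correlation is simply the Pearson correlation computed
with one dichotomous variable.

```python
from math import sqrt


def point_biserial(confidence, correct):
    """Pearson correlation between ratings and a 0/1 outcome.

    Returns None when either variable has no variability, because the
    correlation is undefined in that case.
    """
    n = len(confidence)
    if n != len(correct) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_conf = sum(confidence) / n
    mean_corr = sum(correct) / n
    # Sums of cross-products and squared deviations about the means.
    cross = sum((c - mean_conf) * (k - mean_corr)
                for c, k in zip(confidence, correct))
    ss_conf = sum((c - mean_conf) ** 2 for c in confidence)
    ss_corr = sum((k - mean_corr) ** 2 for k in correct)
    if ss_conf == 0 or ss_corr == 0:
        return None
    return cross / sqrt(ss_conf * ss_corr)


# Hypothetical subject: four texts, higher confidence on the two correct ones.
r = point_biserial([1, 2, 3, 4], [0, 0, 1, 1])
```

A subject who gives the same rating to every text, or who answers every
inference correctly, yields None here; such a subject contributes no
information about calibration.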

In subsequent unpublished experiments deploying a variety of performance
measures and a diverse set of measures of calibration, the finding of zero or
marginal calibration has recurred. This result is disconcerting because it
appears to identify an important obstacle in learning from text. The result
also does not conform to our personal experience. In our experience in learning
from text, calibration of comprehension seems reasonably good.

Upon more detailed scrutiny of our experience, our initial impression that,
in general, we were calibrated had to be qualified. Our impression may have
been much affected by the availability heuristic. In assessing the degree of
calibration that we exhibited, we relied heavily on the most readily available
instances, and as a matter of course, these were instances involving texts in
our personal domains of expertise. By contrast, in our experiments, the texts
were by design a varied set that probably touched only peripherally on readers'
special fields of competence. These considerations led to the current
experiment to test the relationship between calibration and expertise.

Everyday observation suggests that experts may be well-calibrated. These
observations are probably confounded with the domain of reading, however. That
is, the expert knows that he is competent in the domain of expertise and that he
is less competent in other domains. Thus by using base rates the expert can
accurately predict better performance in the domain of expertise than in
alternative domains. Nonetheless, this ability to predict relative performance
across domains does not imply that the expert is well calibrated within a
domain.

In fact, a sampling of the literature indicates that relative expertise
does not confer an ability to predict performance within the domain. Oskamp
(1965) has reported that trained clinical psychologists are greatly
overconfident in their predictions derived from reading case studies.
Similarly, Hock (1985) found that students in a master's in business
administration program were overconfident in their predictions of their future
success in developing employment opportunities. Bradley (1981) had
undergraduates rank their knowledge in twelve domains. He then administered a
short test on content from each domain and had subjects rate confidence in each
answer. Performance on the test was positively related to the knowledge
rankings. However, confidence in incorrect answers also increased with the
knowledge ranking. The "experts" were less likely (or willing) to admit
ignorance.

We recruited subjects who had a minimum of two college-level physics
courses or two college-level music courses (excluding performance courses such
as marching band). Within each of these groups subjects had a wide range of
formal coursework and non-academic experience. We chose these two domains
because the knowledge acquired within the two domains has little overlap. Also,
Birkmire (1982) has found that music students reading in the domain of music
were more sensitive to structurally important components of the text than when
reading in the domain of physics. Physics students showed the converse effect.

Our stimulus materials were prepared by two graduate students: a graduate
student in physics composed 16 expositions on various topics in physics; a
graduate student in music theory composed 16 expositions on various topics in
music. Each of the subjects read all of these texts, eight physics texts and
eight music texts on each of two days. At the end of each day's session, the
subject rated confidence in ability to correctly answer inferences for each text
and was given the inference verification test. (Glenberg and Epstein (1985)
demonstrated that delaying the confidence assessment and the test until the end
of a session does not change calibration.)

The expertise hypothesis predicts that physics students will be better
calibrated for the physics texts than for the music texts, and that music
students will show the opposite pattern. On the other hand, expertise may only
confer the ability to predict better performance in the domain of expertise than
in an alternative domain. In this case, (a) experts will be poorly calibrated
in both domains, but (b) calibration computed across domains will be greater
than zero.

The experiment was also designed to assess a number of other questions.
First, Glenberg and Epstein (1985) found that, although the average measure of
calibration was not significantly different from zero, there was large variation
in the point biserial correlations. Having subjects read texts on two days
allowed us to determine if this variability is due to random error or stable
individual differences.

In addition to obtaining information from subjects regarding their
experiences in the domains of physics and music, each subject was assessed on
the dualism scale (Ryan, 1984). A dualist has relatively immature
epistemological standards, believing that truth is absolute in most if not all
domains. A relativist believes that truth is determined by the context, that
propositions are true or false within a particular frame of reference. Ryan
demonstrated that relativists engage in more sophisticated comprehension
monitoring than do dualists. Thus if there are stable individual differences in
calibration of comprehension, the tendency toward dualism may well predict those
differences.

The experiment was also designed to test the generality of two other
findings reported by Glenberg and Epstein (1985). In their third experiment,
subjects provided three responses after answering the inference question for
each text. First, the subject was asked to rate confidence in the correctness
of the answer to the inference question. The correlation of this confidence
rating and performance on the test is called calibration of performance. In
contrast to initial calibration, calibration of performance was significantly
greater than zero. This finding is consonant with Lichtenstein, Fischhoff, and
Phillips's (1982) result that the accuracy of postdictions is significantly better
than chance (although generally exhibiting overconfidence).

After rating confidence in performance, subjects in Glenberg and Epstein's
third experiment provided another assessment of confidence in ability to judge
inferences on an upcoming test. Then a second inference test was given. The
correlation between this second prediction and performance on the second test is
called recalibration. In Glenberg and Epstein's third experiment, recalibration
was significantly greater than zero. Glenberg and Epstein proposed that the
experience gained from answering the first inference question (e.g., ease of
retrieval of relevant propositions, amount of time required to check the
inference) provided valid cues to the degree of comprehension, and that these
cues could be used to predict future performance. A similar hypothesis has
been offered to explain the relationship between accuracy and confidence in
eye-witness identification. Kassin (1985) found that subjects in the
eye-witness identification task are generally poorly calibrated. Having
subjects attend to the experience of making a judgment results in significant
improvements in calibration.

The current experiment includes the measurements needed to compute both
calibration of performance and recalibration. Either of these measures may be
related to expertise in a domain of knowledge.

Method 

Subjects 

A total of 70 subjects was recruited from the University of
Wisconsin-Madison community. A variety of recruitment procedures were used
including posters advertising the experiment, mailings to students meeting the
minimum coursework requirements, and solicitation in upper-level classes. The
minimum coursework requirement was completion of two university-level courses in
either physics or music theory. Upon completing the experiment, subjects
completed a questionnaire requiring a listing of the university-level music and
physics courses completed, as well as listing other experiences either in music
(e.g., lessons on an instrument) or physics (working as a laboratory assistant).
These experiences were coded using a scale of 0 (no experience) to 3 (experience
at a professional level such as giving music lessons). Descriptive statistics
are given in Table 1.


Insert  Table  1  about  here 


Since there were subjects who had relevant experience in both music and
physics, we did not attempt to classify subjects into mutually exclusive
categories. Instead, background knowledge was coded using four variables:
number of music courses, music experience, number of physics courses, and
physics experience. These four variables were then entered, as a set, into a
hierarchical multiple regression analysis to determine the effect of background
knowledge on calibration.

The questionnaire also contained a seven-item scale for measuring dualism
(Ryan, 1984). Subjects rated the relative frequency (1 = rarely, 5 = almost
always) of experiencing thoughts such as "If professors would stick more to the
facts and do less theorizing one could get more out of college." The higher the
average rating, the greater the tendency toward dualism. Data from this scale
are also given in Table 1.




Subjects  were  paid  $8.00  for  participating  in  the  experiment. 

Materials 

Each text was one paragraph long and was written to illustrate or explicate
a central principle that was stated explicitly in the text. An example is
presented in the appendix with the central principle highlighted. The principle
was not highlighted for the subjects. Two pairs of inference questions were
written for each text. Each of these questions stated an inference that the
subject was to judge as true or false. One member of each pair was a true
inference; the other member of each pair was a false inference. Accurate
performance on the inference tests required knowledge of the central principle.
Examples of the inference tests are provided in the appendix.

The texts were arranged in two booklets with 16 texts in each. One booklet
was used for the first session, and one booklet was used for the second.
Within each booklet there were eight music texts alternating with eight
physics texts. The order of the texts was counterbalanced over subjects.

Following the texts in each booklet were 16 sets of five probes. Each
set corresponded to one of the texts, and the sets were in the same order as
the texts. The confidence probe (probe 1) gave the title of the text and
required the subject to indicate confidence in ability to judge the
correctness of an inference regarding ____. The blank was filled with a
reference to the central principle (see the appendix for examples). Subjects
responded by circling a confidence rating of 1 (very low) to 6 (very high).

The inference test (probe 2) was on the following page (headed by the title
of the relevant text). Subjects judged the correctness of the inference by
circling a T (true) or F (false). The confidence in performance scale (probe 3)
was on the same page. Subjects were asked to rate their confidence that they
had answered the inference test correctly (using a number from 1 to 6). The
recalibration confidence scale (probe 4) was also on this page. Subjects
indicated confidence in ability to answer another inference regarding the
central principle. Once again, confidence was indicated by circling a number
from 1 to 6.

The  following  page  presented  the  second  inference  test  (the  fifth  probe). 
This  page  was  also  headed  by  the  title  of  the  text.  Again,  subjects  responded 
by  circling  T  or  F. 

Procedure 

Subjects were tested in small groups. The instructions explained that the
aim of the experiment was to investigate how students assess comprehension.
They were told that they could read the passages at their own pace, and
re-reading of a passage was allowed. However, once any page was turned, it
could not be turned back. Further instruction regarding how to answer the five
probes was also provided.

On the first day, the experiment was adjourned after subjects had read and
completed the 16 sets of probes. The second session was scheduled for 1 to 7
days later. At the end of the second session the subjects completed two
questionnaires. For the first, subjects were asked to rate the familiarity of
each of the 32 texts on a scale of 1 to 6. Subjects were provided with copies
of the texts while producing the ratings. The second questionnaire
was the survey on domain-specific experiences and dualism.

Results 

The basic strategy of data analysis was to use hierarchical multiple
regression techniques to perform an analysis of variance (Cohen & Cohen,
1977). Two groups of analyses were performed. In the initial analyses the
between-subjects variables were dualism, entered into the regression first,
followed by the four background knowledge variables entered as a set with four
degrees of freedom. The protected-t procedure was used; the significance of
individual components of the background knowledge set was examined only when
the omnibus F was significant. The within-subjects variables were type of
text (music or physics) and the interaction of type of text and background
knowledge. The protected-t procedure was also used to examine components of
this interaction. The interaction of dualism and type of text was not
examined. The MSE terms were computed by dividing the proportion of
(between-subject or within-subject) variance not accounted for by any of the
independent variables by the degrees of freedom.

The second set of analyses was motivated by two concerns. First, the
dualism variable accounted for little variance and thus tended to waste
degrees of freedom. Second, there were significant positive correlations
between the music experience and music courses variables (.62) and between physics
experience and physics courses (.47). These correlations can distort the
significance levels of the individual variables when they are entered as a
set (the problem of collinearity, Cohen & Cohen, 1975). For these reasons,
the second set of analyses omitted the dualism, music experience, and physics
experience variables. Fortunately, the second set of analyses produced a very
similar pattern of significant results to the first set of analyses. Because
the second analyses are simpler, they will be the main focus of the results
section. Reference to the first analyses will be made only when there is a
significant discrepancy between the two.

The measurement of calibration requires variability both in the use of
the confidence scale and in performance on the inference test. Because some
subjects used the same confidence judgment or answered all of the inference
questions correctly, they were excluded from some of the analyses.
Consequently, the number of subjects contributing to each analysis differed.
This number is indicated at the beginning of each of the sections dealing with
separate analyses.

Initial  calibration  and  its  components 

Confidence (probe 1), n = 61. The mean confidence on the music texts
(with standard deviation in parentheses) was 4.69 (.99), and the mean
confidence on the physics texts was 4.73 (.94). These means were not
significantly different. There was one significant effect in the analysis of
variance: type of text interacted with background knowledge, F(4, 116) = 79.34,
MSE = .0024. Both of the background knowledge variables, number of music
courses and number of physics courses, were significant contributors to this
interaction.


Insert  Table  2  about  here 


The  regression  coefficients  are  given  in  Table  2.  These  coefficients 
indicate  the  average  change  in  the  dependent  variable  (in  this  case, 
confidence)  for  each  unit  change  in  the  independent  variable. 

The coefficients in Table 2 indicate a reasonable pattern of relationships
between the independent variables and confidence. Confidence in music texts
increases with the number of music courses, and the increase for music texts is
significantly greater than the increase for the physics texts. Also, confidence
in physics texts increases with number of physics courses, and that increase is
significantly greater for the physics texts than for the music texts.




These results provide a manipulation check on the construction and
classification of the texts, and the validity of the background knowledge
variables. That is, the interaction between text type and confidence is just
what would be expected if our subjects did indeed differ in expertise in the two
fields, and the texts tapped that difference.

Proportion correct on the first inference test (probe 2), n = 61. Mean
proportion correct was .72 (.12) on the music texts and .79 (.12) on the physics
texts, a significant difference, F(4, 116) = 38.39, MSE = .0021. The set of
background knowledge variables also accounted for a significant part of the
variance, F(2, 58) = 8.48, MSE = .0133. Only the physics courses variable was
significant by the protected-t procedure. Each additional physics course was
associated with a .0217 increase in proportion correct (averaged over both types
of text).
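
As a rough illustration of how such a coefficient is read, the short Python
sketch below converts the reported slope into an expected difference in
proportion correct between two subjects who differ in number of physics
courses. Only the slope (.0217) comes from the text; the function name and
the four-course comparison are hypothetical, and no intercept is assumed, so
only differences (not absolute levels) are computed.

```python
# Slope reported in the text: change in proportion correct per physics course.
SLOPE_PER_PHYSICS_COURSE = 0.0217


def expected_difference(extra_courses):
    """Expected difference in proportion correct for `extra_courses`
    additional physics courses, holding the other predictors constant."""
    return SLOPE_PER_PHYSICS_COURSE * extra_courses


# Hypothetical comparison: a subject with four more physics courses.
diff = expected_difference(4)
```

Under the linear model, four additional physics courses correspond to a
difference of about .087 in proportion correct.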

In the first analyses of proportion correct, a significant main effect
was found for dualism, F(1, 55) = 4.54, MSE = .0129. Each unit increment on
the dualism scale was associated with a .0268 reduction in proportion correct.

There was also a significant interaction between type of text and
background knowledge, F(2, 116) = 19.42, MSE = .0021. The regression coefficients
for this interaction are given in Table 2. The major component carrying the
interaction was number of music courses. Proportion correct on the music texts
increased with increases in music courses, whereas proportion correct on the
physics texts was essentially unrelated to music courses. The opposite pattern
was found for the physics courses variable (although not significant):
Proportion correct on the physics texts increased more with physics experience
than did proportion correct on the music texts. The failure to reach
significance may in part reflect the problem of collinearity. The two variables
are significantly, although negatively, correlated (-.44).




Calibration of comprehension, n = 50. Calibration is measured by the
degree of association between confidence and performance on the inference test.
One such measure is the point-biserial correlation. Unfortunately, this measure
has a number of undesirable properties, including that its maximum value
depends on the proportion correct. Nelson (1984) suggests that the
Goodman-Kruskal gamma (G) is the most appropriate index of association for
measuring metacognitive performance under the conditions instantiated in this
experiment. Gamma ranges from -1 to 1, with 0 indicating no relationship. It
has a direct interpretation in terms of the difference between two
probabilities. Consider all pairs of texts that, for a given subject, differ on
both confidence and performance on the inference test. Gamma is the difference
between the probability that the text with the greater confidence has the better
performance and the probability that the text with the greater confidence has
the lower performance.
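
The pairwise definition of gamma can be illustrated with a minimal sketch.
The function name, the six-text data set, and the choice to return None
when no pair differs on both measures are assumptions made for
illustration; they are not taken from the report's analysis programs.

```python
from itertools import combinations


def goodman_kruskal_gamma(confidence, correct):
    """Gamma for one subject: (concordant - discordant) / (concordant + discordant).

    Pairs of texts tied on confidence or tied on performance are ignored.
    Returns None when no pair differs on both measures.
    """
    concordant = discordant = 0
    for (c1, p1), (c2, p2) in combinations(zip(confidence, correct), 2):
        product = (c1 - c2) * (p1 - p2)
        if product > 0:        # higher confidence goes with better performance
            concordant += 1
        elif product < 0:      # higher confidence goes with worse performance
            discordant += 1
    total = concordant + discordant
    if total == 0:
        return None
    return (concordant - discordant) / total


# Hypothetical subject: confidence ratings and correctness (1 = correct)
# for six texts. These values are invented for the example.
confidence = [6, 5, 5, 3, 2, 4]
correct = [1, 1, 0, 0, 1, 0]
gamma = goodman_kruskal_gamma(confidence, correct)
print(gamma)
```

In this invented case eight pairs differ on both measures, five concordant
and three discordant, so gamma is (5 - 3) / 8 = .25.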

For each subject, G was computed separately for the music texts and for
the physics texts. The means were .06 (.53) for the music texts and .02 (.62)
for the physics texts. Neither of these means was significantly different from
zero, nor were they different from one another. Although none of the main
effects was significant, there was a significant interaction between type of
text and background knowledge, F(2, 94) = 7.99, MSe = .0044. The regression
coefficients for this interaction are given in Table 2. The significant
component of the interaction was the interaction of text type and number of
physics courses. An increase in number of physics courses tended to decrease
G for the physics texts, but had essentially no relationship to G for the
music texts.


The finding of no overall calibration of comprehension replicates our
previous results (Glenberg & Epstein, 1985). The new information provided by
this experiment concerns the relationship between level of knowledge in a domain
and calibration in that domain. Under these experimental conditions that
relationship is negative. Note that for the physics texts, subjects with no
physics courses and the average number of music courses (2.76) are predicted by
the regression equation to be fairly well calibrated, G = .3152. However, the
predicted G drops to .0170 for subjects with the average number of both music
and physics courses. This new result is discussed further in the Discussion
section.
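
The predicted values quoted above can be checked with a short sketch. It
assumes the physics-text G coefficients are an intercept of 0.3740, a music
courses coefficient of -0.0213, and a physics courses coefficient of -0.1165,
as read from the regression table (the last value is partly damaged in the
scanned copy). The mean number of physics courses is not stated in this
section, so the sketch only back-solves the value implied by the quoted
prediction of .0170; that derived figure is an inference, not a reported
statistic.

```python
# Physics-text G regression coefficients, as read from the coefficient table.
# The physics-courses value is restored from a damaged entry (assumption).
INTERCEPT = 0.3740
B_MUSIC = -0.0213
B_PHYSICS = -0.1165

MEAN_MUSIC_COURSES = 2.76  # stated in the text


def predicted_physics_g(music_courses, physics_courses):
    """Predicted calibration G on the physics texts from the linear equation."""
    return INTERCEPT + B_MUSIC * music_courses + B_PHYSICS * physics_courses


# Subject with no physics courses and the average number of music courses.
g_no_physics = predicted_physics_g(MEAN_MUSIC_COURSES, 0)

# Mean number of physics courses implied by the quoted prediction of .0170.
implied_mean_physics = (0.0170 - g_no_physics) / B_PHYSICS

print(round(g_no_physics, 4), round(implied_mean_physics, 2))
```

Under these assumptions the first prediction reproduces the reported .3152,
and the second prediction corresponds to a mean of roughly 2.56 physics
courses.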

Calibration  of  Performance 


Insert  Table  3  about  here 


Confidence in performance (probe 3), n = 61. After answering an
inference question, each subject rated confidence in his or her answer to the
inference question. The mean confidence ratings were 4.76 (.73) and 4.99
(.67) for the music and physics texts, respectively. These means were
significantly different, F(1, 116) = 12.22, MSe = .0021. There was also a
significant interaction between type of text and background knowledge,
F(2, 116) = 59.59, MSe = .0021. Each of the background knowledge variables
contributed to this interaction, ts > 3.65.

The regression coefficients are given in Table 3. Note that the pattern of
the coefficients differs for confidence (probe 1, Table 2) and confidence in
performance (probe 3, Table 3). That is, for both variables, the difference
between the coefficients for music texts and physics texts is smaller in Table 3
than in Table 2. We will use this difference to argue (in the Discussion
section) that subjects used different strategies to produce the two confidence
ratings.


Is there a significant relationship (G) between confidence in performance
and actual performance? In short, the answer is yes. The average performance G
for the music texts was .42 (.43) and the average for the physics texts was
.36 (.55). Both of these Gs are significantly greater than zero, and they are
sizeable on an absolute scale. Remember that G is a difference in
probabilities: An average G of .39 means that for pairs of texts that differ in
confidence and in whether or not they are correct on the inference test, the
probability that the text with the greater confidence is correct is .39 greater
than the probability that the text with the lower confidence is correct.

Performance G was unrelated to number of music courses and unrelated to
number of physics courses; also, the background knowledge variables did not
interact with type of text. Thus, to the extent that the null hypothesis is
supported, calibration of performance is unrelated to expertise.

The significant performance G is important in two respects. First, it
replicates our previous finding (Glenberg & Epstein, 1985), and creates a
bridge between our work on calibration of comprehension and other work on
calibration of probabilities. The ability to accurately postdict performance
has been a stable feature of the calibration literature (Lichtenstein et al.,
1982).

Second, the significant performance G helps to rule out some uninteresting
interpretations of the non-significant calibration of comprehension. In
particular, given that performance G is significant, it is less likely that
the non-significant calibration of comprehension G reflects low statistical
power, or any hidden constraints in our procedures.


Recalibration and Its Components


Insert  Table  4  about  here 


Recalibration confidence (probe 4), n = 61. After assessing confidence in
performance, subjects were asked for confidence in ability to answer a second
inference test related to the same principle. Recalibration confidence is
markedly similar to calibration confidence (probe 1). The recalibration
confidence means were 4.67 (.87) and 4.72 (.88) for the music and physics texts,
respectively. The only significant effect was the interaction of text type and
background knowledge, F(4, 116) = 77.14, MSe = .0022. The regression
coefficients are given in Table 4. Note that for both variables, the difference
between the coefficients for the music and physics texts is almost as great for
recalibration confidence as for calibration confidence (Table 2).

Recalibration proportion correct (probe 5), n = 61. Performance on the
second inference test was similar to performance on the first. The mean
proportions correct were .73 (.13) and .79 (.12) for the music and physics
texts, respectively. The difference was significant, F(1, 116) = 21.48,
MSe = .0030.

There was also a significant interaction between type of text and
background knowledge, F(2, 116) = 10.61, MSe = .0030. The regression
coefficients are listed in Table 4. The only significant component in the
interaction involves the number of physics courses variable. Increments in
number of physics courses are associated with increments in proportion correct
for the physics texts, but not for the music texts (this effect was not
significant in the first analysis using four variables to code background
knowledge).


As in the analysis of the first inference test, there was a main effect for
dualism, F(1, 55) = 8.15, MSe = .0135, in the first set of analyses. On the
average, a unit increase in the dualism variable was associated with a decrease
of .0365 in proportion correct.

Recalibration G, n = 54. Recalibration Gs were .06 (.53) and .02 (.62)
for the music and physics texts, respectively. Neither was significantly
different from zero. Background knowledge did account for a significant
proportion of the variance in recalibration G, F(2, 51) = 4.49, MSe = .0167.
Number of music courses was the variable that contributed most.

There was also a significant interaction between type of text and
background knowledge, F(2, 102) = 6.12, MSe = .0032, that was carried by the
physics courses variable. The regression coefficients for this interaction are
in Table 4. As with initial calibration, increments in physics courses had a
greater detrimental effect on recalibration for the physics texts than for the
music texts.

The recalibration data do not replicate the effect reported by Glenberg
and Epstein (1985). They found that recalibration was significantly greater
than initial calibration (based on probes 1 and 2). Here, overall
recalibration is not different from zero, and any effect of expertise is to
decrease recalibration, much as it decreases initial calibration. This failure
to replicate is addressed in the Discussion.

Stability of Calibration Over Days, n = 61

Two new calibration Gs were computed for each subject, one for day 1 and
one for day 2 of the experiment. Each of these Gs was based on probes 1
(initial confidence) and 2 (initial inference evaluation) for 16 texts, 8 music
texts and 8 physics texts. In contrast, all previously reported Gs were computed
separately for the different types of texts.




The across-text-type Gs were .18 (.54) and .30 (.45) for day 1 and day 2,
respectively. Both of these Gs are significantly greater than zero, ts = 2.60
and 5.21, respectively.
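
The two t values can be reproduced from the summary statistics with a
one-sample t test against zero. This is a minimal sketch that assumes the
values in parentheses are standard deviations across subjects, that the
day 1 value is .54, and that all 61 subjects contribute to each mean; the
function name is illustrative.

```python
import math


def one_sample_t(mean, sd, n):
    """t statistic for testing a sample mean against zero."""
    return mean / (sd / math.sqrt(n))


N_SUBJECTS = 61  # from the section heading

t_day1 = one_sample_t(0.18, 0.54, N_SUBJECTS)
t_day2 = one_sample_t(0.30, 0.45, N_SUBJECTS)

print(round(t_day1, 2), round(t_day2, 2))
```

Under these assumptions the computed values round to 2.60 and 5.21, which
agrees with the reported statistics.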

The  correlation  between  across-text-type  G  for  day  1  and  across-text-type  G 
for  day  2  was  only  -.03.  This  may  be  compared  with  the  correlation  between 
confidence  (probe  1)  on  day  1  and  day  2,  .84,  and  the  correlation  between 
proportion  correct  on  the  two  days,  .37.  This  failure  to  find  stable  individual 
differences suggests that the search for variables (e.g., dualism) that would
correlate with calibration is futile.

These  data  present  somewhat  of  a  mystery.  Why  should  G  computed  by 
collapsing  across  type  of  text  be  significantly  greater  than  zero,  when 
calibration  (based  on  the  same  number  of  texts)  computed  within  a  type  of  text 
is  essentially  zero?  One  rather  uninteresting  explanation  is  that  G  based 
on  a  single  type  of  text  suffers  from  a  restricted  range;  combining  across 
text types pools texts that have a greater range on both the confidence scale
and proportion correct, resulting in a larger G.

Two arguments can be made against this explanation. First, G, unlike
the product-moment correlation, requires only ordinal data. In fact, the
value of the statistic is completely unaffected by the range of confidence
scores, as long as there is some variability so that the statistic can be
computed.

Second,  recall  that  performance  Gs  were  significantly  greater  than  zero. 
These  performance  Gs  use  exactly  the  same  proportion  correct  data  as  the 
calibration  Gs  that  are  not  significantly  different  from  zero.  Clearly,  the 
poor  calibration  Gs  cannot  be  attributed  to  restricted  range  of  performance. 

A second explanation for the significant across-text-type Gs is
provided by the following hypothesis. We suppose that subjects can
accurately classify themselves as relatively more expert in music or in
physics. We also suppose that self-classified music students believe that
they will do better on music texts than on physics texts, and that
self-classified physics students believe the opposite. In fact, these beliefs
are consonant with the results of our analyses of proportion correct. Finally,
we suppose that confidence is based on these beliefs. Because performance is
better for texts in the domain consonant with the self-classification than in
the other domain, the self-classification is indeed predictive of performance,
so that across-text-type G is greater than zero. According to this hypothesis,
calibration across domains simply reflects the expert's use of base rates to
accurately predict differences in performance across domains.

There  is  strong  evidence  consistent  with  the  self-classification 
hypothesis.  According  to  the  hypothesis,  subjects  use  their  experience  with 
music  or  physics  to  generate  a  confidence  assessment  for  each  text.  This 
experience is public data, at least to the extent it is revealed on the
questionnaire  filled  out  at  the  end  of  the  experiment  (see  Method  section  and 
Table  1).  If  the  hypothesis  is  correct,  we  should  be  able  to  use  these  public 
data  to  generate  confidence  ratings  that  predict  performance  as  well  as  the 
confidence  ratings  actually  given  by  the  subjects. 

The test of this prediction required several steps. (A total of 43
subjects contributed to all steps.) First, a calibration G was computed for
each subject using all 32 texts (to provide a maximally sensitive test). The
average G was .20 (.35), which is significantly greater than zero, t = 3.75.
Next, using the regression coefficients for confidence listed in Table 2, we
computed for each subject a single simulated confidence rating for music texts
and a single simulated confidence rating for physics texts. Finally, using
these simulated confidence ratings, a simulated G was computed for each
subject.
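
The simulation steps can be sketched as follows. The confidence coefficients
are those read from the regression table (the music-text intercept, 4.7471,
is restored from a damaged entry, so it is an assumption), and the subject's
course counts and numbers of correct answers are invented. Because each
subject receives only two simulated confidence values, one per text type,
and performance is scored correct or incorrect, gamma reduces to a comparison
of two counts: pairs in which the text from the higher-confidence domain is
correct and the other is not, against pairs showing the reverse. The function
names and the 16 texts per domain default are illustrative choices.

```python
# Confidence regression coefficients as read from the coefficient table:
# (intercept, music courses, physics courses). The music-text intercept is
# restored from a damaged scan entry (assumption).
MUSIC_CONF = (4.7471, 0.1003, -0.1300)
PHYSICS_CONF = (4.5301, -0.0789, 0.1601)


def simulated_confidence(coefs, music_courses, physics_courses):
    """Single simulated confidence rating for one text type."""
    intercept, b_music, b_physics = coefs
    return intercept + b_music * music_courses + b_physics * physics_courses


def simulated_g(music_courses, physics_courses,
                music_correct, physics_correct, texts_per_domain=16):
    """Gamma between two-valued simulated confidence and correctness."""
    conf_music = simulated_confidence(MUSIC_CONF, music_courses, physics_courses)
    conf_physics = simulated_confidence(PHYSICS_CONF, music_courses, physics_courses)
    if conf_music == conf_physics:
        return None  # no pair of texts differs in simulated confidence
    if conf_music > conf_physics:
        high_correct, low_correct = music_correct, physics_correct
    else:
        high_correct, low_correct = physics_correct, music_correct
    high_wrong = texts_per_domain - high_correct
    low_wrong = texts_per_domain - low_correct
    concordant = high_correct * low_wrong   # confident text right, other wrong
    discordant = high_wrong * low_correct   # confident text wrong, other right
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical subject: 5 music courses, no physics courses,
# 13 of 16 music texts and 10 of 16 physics texts answered correctly.
g = simulated_g(5, 0, music_correct=13, physics_correct=10)
print(round(g, 4))
```

For this invented subject the simulated confidence is higher for music
texts, giving 78 concordant and 30 discordant pairs and a simulated G of
about .44. No access to the subject's actual confidence ratings is needed.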

The mean simulated G was .22 (.44). This G was significantly greater than
zero, t = 3.28. The mean simulated G and the mean of the actual Gs (based on 32
texts) were not significantly different. Importantly, the correlation between
the simulated Gs based on public data and the Gs based on the subjects' own 32
confidence ratings was .57.

An implication of the self-classification hypothesis is that subjects are
not using any sort of privileged access to their own knowledge to generate
confidence assessments; indeed, the hypothesis implies that subjects are not
assessing comprehension of the texts when they provide a confidence judgement.
Instead, they are simply recording a belief based on their general experience.
Thus the significant across-text-type G should not be taken as evidence of
accurate self-assessment of comprehension. As just demonstrated, the confidence
scores generated by the regression equation, which obviously has no privileged
access to a subject's degree of comprehension, can predict performance as well
as the subject's own confidence ratings.

A similar explanation can be applied to the significant correlation between
average confidence and average performance. On day 1, the correlation was .51,
and on day 2 the correlation was .37. These correlations do not imply that
subjects are calibrated. Some subjects know that they generally do well on
tests and hence have high confidence; other subjects know that they generally do
poorly on tests and hence have low confidence. To the extent that past
experience predicts future performance, there is a correlation between average
confidence and performance. However, neither the subjects who generally do well
nor those who generally do poorly can accurately assess comprehension and
predict which inference tests will be answered correctly: When calibration must
be based on actual assessments of comprehension (i.e., within a text type),
calibration is zero.

Discussion 

This experiment was designed to answer four questions. The first question
was whether calibration of comprehension for texts in a given domain changes
with expertise in that domain. The answer is yes, but perhaps in an unexpected
way. The regression analyses for both calibration and recalibration indicate
that G decreases with experience in a domain (and significantly so for
physics).

The second question was whether there are stable individual differences in
calibration of comprehension. Here the answer is no. Even the significant
across-text-type G was not stable across days.

The third question was whether accurate calibration of performance would be
found. For this question the answer is yes. Calibration of performance was not
only statistically significant, it was quite large, .42 for the music texts and
.36 for the physics texts (recall that G is the difference between two
probabilities). Apparently, subjects can fairly accurately judge the quality of
their performance on an inference verification test.

The fourth question concerned recalibration. Previous results indicated
that subjects could take advantage of experience gained while answering an
inference test to predict performance on future tests over the same material.
The subjects participating in this experiment did not exhibit this ability.

Self-Classification Hypothesis


The pattern of the results discussed so far, as well as other data, is
consistent with the self-classification hypothesis. The hypothesis is that
subjects classified themselves as relatively expert in music or physics, and
used the belief that expertise in a domain is correlated with comprehension of
texts in that domain to generate confidence ratings. That is,
self-classification rather than assessment of text comprehension controlled the
confidence ratings.

The  strongest  evidence  consistent  with  the  hypothesis  is  from  the  analysis 
of  the  simulated  Gs.  The  mean  simulated  G  was  not  significantly  different  from 
the  mean  G  produced  by  the  subjects,  and  the  correlation  between  the  simulated 
Gs and the actual across-text-type Gs was substantial (.57).

The  self-classification  hypothesis  provides  a  simple  explanation  for  the 
poor  calibration  within  a  text  type.  According  to  the  hypothesis,  subjects  are 
not actually assessing comprehension; instead they are responding on the basis
of  beliefs  about  their  abilities  within  a  given  domain.  These  beliefs  are  not 
sufficiently  fine-grained  (differentiated)  to  accurately  predict  performance 
within  a  domain. 

Variability of confidence ratings within a domain may be based on judged
familiarity with a topic. In fact, the average correlation between familiarity
ratings (obtained at the end of the second session) and confidence was .63
(.17). When these familiarity ratings (one for each text) are used to compute a
G, the average familiarity G, .23 (.29), is not significantly different from
the average simulated G based on a single confidence rating for each type of
text. Thus, although the familiarity ratings account for variability in the
confidence ratings, they do not contain any useful information for predicting
performance over and above that provided by the self-classifications.

The self-classification hypothesis is also at least partially consistent
with the negative relationship between expertise and calibration (within a
domain). Most likely, only subjects who regard themselves as having some
expertise will apply the self-classification strategy. Other subjects may
actually carry out some form of evaluation of comprehension that predicts
performance on the inference test (based on the regression equations, subjects
with an average number of music courses, but no physics courses, were
calibrated on the physics texts). Thus increasing expertise is associated with
application of a less successful strategy for predicting performance within a
domain.

The self-classification strategy was probably also applied when subjects
were asked to re-assess confidence (probe 4) in future performance. The pattern
of regression coefficients relating background knowledge to initial confidence
(probe 1) was similar to the pattern relating background knowledge to
re-assessed confidence (probe 4; compare Tables 2 and 4). Apparently subjects
were using the same information (self-classifications) to make both ratings.

On the other hand, it appears that confidence in performance (probe 3) was
not determined by self-classification. First, these confidence ratings were
significantly correlated with actual performance (performance G greater than
zero) within a domain of knowledge, which is not possible by application of the
self-classification strategy alone. Second, the pattern of regression
coefficients relating background knowledge to confidence in performance is quite
different from the pattern relating background knowledge to initial confidence
(compare coefficients in Table 3 to those in Table 2).

When Is the Self-Classification Strategy Applied?

We have stressed the contribution that self-classification may make to the
computation of confidence. But we do not intend to imply that the metacognitive
rule expressing the relationship between self-classification and likelihood of
successful performance is the only rule for computing confidence. Other rules
based on familiarity and ease or completeness of access to the relevant text may
also be engaged. In fact, earlier we reported a significant correlation between
familiarity ratings and confidence ratings.

Given that there is a repertoire of metacognitive rules for computing
confidence, when is the self-classification strategy applied? One consideration
may be the task setting. Various aspects of the setting of the current
experiment probably encouraged use of the strategy. Subjects knew that they
were selected on the basis of their experience in music and physics courses. In
addition, the texts were clearly in one domain or the other, and the contrast
was heightened by the presentation order, which alternated texts from the two
domains. Probably, the strategy is encouraged whenever the domain of the text
clearly matches the subject's own beliefs about domains of expertise.

In addition to the task setting, it is plausible to postulate that other
factors affecting availability of rules in memory are involved in determining
the subject's choice from the repertoire of metacognitive rules. Also, it seems
likely that the process of selection is dynamic, reflecting the effects of
several variables operating concurrently to assign prominence to different
metacognitive rules. The dynamic character of the process helps us to formulate
a coherent account of the principal findings of this study.

We have argued that the initial confidence rating was computed by
application of the self-classification strategy, the rule made most available by
the task setting. Why, then, was the self-classification strategy not applied
when rating confidence in performance? After answering the first inference test
(probe 2), subjects could base their confidence rating on either the
self-classification strategy or the specific experience gained from answering
the inference (such as ease of retrieving relevant propositions from memory).
We propose that most subjects chose to use specific experience for the following
reasons. (a) Having just evaluated the inference (probe 2), the experience was
probably highly available while making the confidence in performance rating
(probe 3). (b) Some of the specific experiences were probably easily recognized
as diagnostic. For example, failure to retrieve any information relevant to
evaluating the inference is easily recognized as a useful predictor of chance
performance. (c) The experience was specific to the particular judgement being
made, whereas the self-classification strategy is more general. Thus, after
answering the first inference, other metacognitive rules (e.g., base confidence
on experience, perhaps latency, answering the question) are at least as
available as the self-classification strategy.

On the other hand, it appears that the self-classification strategy was
applied again in generating predictions about future performance on the
recalibration confidence rating (probe 4; see discussion of recalibration). Why
do subjects revert to using the self-classification strategy for probe 4 after
rejecting it for probe 3? In answering probe 4, subjects also have a choice of
metacognitive rules. We suspect that the self-classification strategy is chosen
because of a difference in the diagnostic value attributed by the subject to the
experience gained from answering the initial inference. Experience answering
the first inference is believed to be diagnostic for judging performance on the
first inference. The experience is believed to have less diagnostic value for
predicting future performance. Given the belief that the diagnostic value of the
experience is low and the ready availability of a strategy with high face
validity, subjects chose the self-classification strategy.

Use of the self-classification strategy when answering probe 4 helps to
explain why significant recalibration was not found in this experiment, but was
found in Glenberg and Epstein (1985). As discussed before, the
self-classification strategy cannot produce calibration within a domain,
obviating any possibility of significant recalibration. In Glenberg and Epstein
(1985) the texts were sampled from a variety of domains, reducing availability
and use of the self-classification strategy. Thus in our previous research,
when subjects re-assessed confidence after the initial inference test, it is
likely that the subjects were forced to use a metacognitive rule with greater
predictive validity than the self-classification strategy.

In  summary,  it  appears  that  the  self-classification  strategy  will  be  used 
(and  be  effective)  under  the  following  conditions.  First,  the  structure  of  the 
calibration task suggests the strategy by highlighting the relationship between
a reader's domain of knowledge and the domain of the text. Second, the reader
does  not  have  available  information  that  is  believed  to  be  more  specific  or  more 
diagnostic  than  self-classification.  Whether  or  not  application  of  the  strategy 
produces  calibration  depends  at  least  in  part  on  the  structure  of  the  task. 
Application  of  the  strategy  across  domains  of  expertise  is  almost  guaranteed  to 
produce  high  calibration.  Unfortunately,  the  self-classification  strategy  alone 
cannot  produce  calibration  within  a  domain  of  expertise. 


References 


Birkmire, D. P. Effect of the interaction of text structure, background
knowledge, and purpose on attention to text (Technical Memorandum 6-82).
U.S. Army Human Engineering Laboratory, Aberdeen Proving Ground, Maryland.

Bradley, J. V. (1981). Overconfidence in ignorant experts. Bulletin of the
Psychonomic Society, 17, 82-84.

Glenberg, A. M., & Epstein, W. (1985). Calibration of comprehension. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 11, 702-718.

Hoch, S. J. (1985). Counterfactual reasoning and accuracy in predicting
personal events. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 11, 719-731.

Kassin, S. M. (1985). Eyewitness identification: Retrospective self-awareness
and the accuracy-confidence correlation. Journal of Personality and Social
Psychology, 49, 878-893.

Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of
probabilities: The state of the art to 1980. In D. Kahneman, P. Slovic, &
A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. New
York: Cambridge University Press.

Nelson, T. O. (1984). A comparison of current measures of the accuracy of
feeling of knowing predictions. Psychological Bulletin, 95, 109-133.

Oskamp, S. (1965). Overconfidence in case study judgements. Journal of
Consulting and Clinical Psychology, 29, 261-265.

Ryan, M. P. (1984). Monitoring text comprehension: Individual differences in
epistemological standards. Journal of Educational Psychology, 76, 248-258.




Author  Notes 

This research was funded by Office of Naval Research Contract
N0014-85-K-0644 and National Institute of Education Grant NIE-G081-0009 to the
Wisconsin Center for Education Research. Any opinions, findings, and
conclusions or recommendations expressed in this publication are those of the
authors and do not necessarily reflect the views of the National Institute of
Education or the Department of Education.

Our thanks to Craig Morris, Tom Sanocki, and Naomi Swanson for assisting in
execution of this research.

Requests  for  reprints  should  be  sent  to  either  author  at  the  Department  of 
Psychology,  W.  J.  Brogden  Psychology  Building,  University  of  Wisconsin,  Madison, 
Wisconsin  53706. 




                                       Independent Variable

Dependent variable          Y-Intercept   Music Courses   Physics Courses

Music text confidence          4.7471        0.1003a         -0.1300b
Physics text confidence        4.5301       -0.0789a          0.1601b
Music text prop. cor.          0.6453c       0.01?1d         *0.0159
Physics text prop. cor.        0.7275c      -0.0022d         *0.0275
Music text G                   0.1034       -0.0251           0.0120e
Physics text G                 0.3740       -0.0213          -0.1165e

Note: Asterisks indicate the coefficients of variables having significant
main effects (significantly related to the dependent variable averaged over
text type). Coefficients with the same letter are significantly different
from one another and indicate a significant interaction between the
independent variable and text type.


Average G                      0.4517       -0.0081          -0.0154




                                    Independent Variable
Dependent                    Y-          Music        Physics
variable                  Intercept      Courses      Courses

Music text confidence      4.6512        0.0944a     -0.0961b
Physics text confidence    4.5287       -0.0667a      0.1421b
Music text prop. cor.      0.7048c       0.0060       0.0012d
Physics text prop. cor.    0.7301c       0.0000       0.0224d
Music text G              -0.0596       •0.0309       0.0098e
Physics text G             0.1768       •0.0277      -0.0918e

Note: Asterisks indicate the coefficients of variables having significant
main effects (significantly related to the dependent variable averaged over
text type). Coefficients with the same letter are significantly different
from one another and indicate a significant interaction between the
independent variable and text type.
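
As a reading aid, the two confidence rows of the table directly above can be turned into predicted ratings. The sketch below is only an illustration: it assumes the coefficients enter a simple additive linear model (intercept plus one slope per course count), which the table layout suggests but the note does not state. The function and variable names are hypothetical.

```python
# Illustrative sketch only. Coefficients are copied from the confidence rows
# of the table above; the additive linear form is an assumption.
COEFFICIENTS = {
    # dependent variable: (y-intercept, music-courses slope, physics-courses slope)
    "music text confidence": (4.6512, 0.0944, -0.0961),
    "physics text confidence": (4.5287, -0.0667, 0.1421),
}


def predicted_confidence(text_type, music_courses, physics_courses):
    """Return the predicted confidence rating for one text type."""
    intercept, b_music, b_physics = COEFFICIENTS[text_type]
    return intercept + b_music * music_courses + b_physics * physics_courses


if __name__ == "__main__":
    # Hypothetical reader with two music courses and no physics courses.
    for text_type in COEFFICIENTS:
        rating = predicted_confidence(text_type, music_courses=2, physics_courses=0)
        print(f"{text_type}: {rating:.2f}")
```

Under these assumptions, additional music courses raise predicted confidence for the music text and lower it for the physics text, which is the crossover pattern that the lettered coefficients mark as a significant interaction.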




Appendix 

Organic  Unity  -  Text 

The way in which the parts of a musical work relate to form a whole has
long been an important consideration of musical aesthetics. The theory of
organic unity, which directly compared the parts and whole of musical works to
those of living things, became part of the evaluative process as an aesthetic
norm in the early 19th century. According to the theory, musical pieces were
analogous to creatures: Each part of a successful work was essential, just as
every part of the body was (supposedly) essential; no part of a good piece
of music could be substituted for another, since each had a specific function in
the unified whole. Furthermore, as in an organic body, the combined functions
of all the parts of a musical masterwork were believed to form a coherent unity
because of specific relationships which held the parts together; thus no part of
the whole could stand separately as a successful work. Certain parts of the
whole were believed to carry more important functions than others, just as the
heart has a more important function than the little toe. Furthermore, it was
believed that great composers were great creators, who, like God, fashioned
"living organisms." (Consider a statement by Karl Rahlert, music aesthetician,
writing in 1848: "What is musical form but the natural body that music must
assume in order to establish itself as a living organism?"). Though the analogy
is useful and interesting, problems with the theory of organic unity are
evident. It assumed that composers were aiming at a particular kind of
structural unity, which was simply not the case for most pieces written before
about 1600 or after about 1910. It demonstrated an evaluative bias against
longer forms, especially opera, where the semblance of complete unity was more
difficult to maintain.


Circle a single number on the following scale to report your confidence in
being able to accurately judge the correctness of an inference drawn from the
reading about the relationships between parts of a composition according to the
theory of organic unity.

1 ___ 2 ___ 3 ___ 4 ___ 5 ___ 6
very                      very
low                       high


Probe  2  -  Initial  Inference 

Organic  Unity 

Inference: According to the theory of organic unity, it is not possible to
improve some compositions by deleting specific parts.

T  F 


Phase  3  -  Confidence  in  Performance 

Organic  Unity 

Circle  a  single  number  on  the  following  scale  to  report  your  confidence 
that  you  have  answered  the  inference  correctly. 


Probe 4 - Recalibration Confidence


Circle a single number on the following scale to report your confidence
that you can judge the correctness of another inference drawn from the reading
about the relationships between parts of a composition according to the theory
of organic unity.


Probe  5  -  Second  Inference 

Organic  Unity 

Inference: The theory of organic unity does not explain why a single
movement of a work is often complete and performable without the other movements
of the composition.


1986/05/06 


Distribution List [U. Wisconsin/Glenberg & Epstein] NR 702-012


Air  Force  Human  Resources  Lab 
AFHRL/MPD 

Brooks  AFB,  TX  78235 

Dr.  Robert  Ahlers 
Code  11711 

Human  Factors  Laboratory 
NAVTRAEQUIPCEN 
Orlando, FL 32813

Dr. Ed Aiken
Navy Personnel R&D Center
San Diego, CA 92152

Dr. Earl A. Alluisi
HQ, AFHRL (AFSC)
Brooks AFB, TX 78235

Dr.  John  R.  Anderson 
Department  of  Psychology 
Carnegie-Mellon University
Pittsburgh,  PA  15213 

Technical Director, ARI
5001 Eisenhower Avenue
Alexandria,  VA  22333 

Dr.  Alan  Baddeley 
Medical  Research  Council 
Applied  Psychology  Unit 
15  Chaucer  Road 
Cambridge  CB2  2EF 
ENGLAND 

Dr. Patricia Baggett
University  of  Colorado 
Department  of  Psychology 
Box  ■’'IS 

Boulder.  CO  Pn-joo 

Dr. Eva L. Baker
UCLA Center for the Study
of Evaluation
145 Moore Hall
University of California
Los Angeles, CA 90024

Dr. Meryl S. Baker
Navy  Personnel  R&D  Center 
San  Diego,  CA  92152 


Dr.  John  Black 
Yale  University 
Box  11A,  Yale  Station 
New  Haven,  CT  06520 

Dr.  Jeff  Bonar 
Learning R&D Center
University  of  Pittsburgh 
Pittsburgh,  PA  15260 

Dr. Robert Breaux
Code N-095R
NAVTRAEQUIPCEN
Orlando, FL 32813

Dr.  Ann  Brown 

Center  for  the  Study  of  Reading 
University  of  Illinois 
51  Gerty  Drive 
Champaign,  IL  61280 

Dr.  John  S.  Brown 
XEROX  Palo  Alto  Research 
Center 

3333  Coyote  Road 
Palo Alto, CA 94304

Dr.  Patricia  A.  Butler 
NIE  Mail  Stop  1806 
1200  19th  St.,  NW 
Washington,  DC  20203 

Dr.  Robert  Calfee 
School  of  Education 
Stanford  University 
Stanford, CA 94305

Dr.  Susan  Carey 
Harvard  Graduate  School  of 
Education 

'‘■’7  Gutman  Library 
Appian  Way 
Cambridge,  MA  02138 

Dr.  Robert  Carroll 
NAVOP 01B7

Washington,  DC  20370 

Dr.  Fred  Chang 

Navy  Personnel  R&D  Center 

Code  51 

San  Diego,  CA  92152 




Dr.  Davida  Charney 
Department  of  Psychology 
Carnegie-Mellon  University 
Schenley  Park 
Pittsburgh, PA 15213

Dr.  Michelene  Chi 
Learning R & D Center
University of Pittsburgh
3939 O'Hara Street
Pittsburgh,  PA  15213 

Mr.  Raymond  E.  Christal 
AFHRL/MOE 

Brooks AFB, TX 78235

Professor  Chu  Tien-Chen 
Mathematics  Department 
National  Taiwan  University 
Taipei,  TAIWAN 

Chief  of  Naval  Education 
and  Training 
Liaison  Office 

Air  Force  Human  Resource  Laboratory 
Operations  Training  Division 
Williams AFB, AZ 85224

Dr.  Stanley  Collyer 
Office  of  Naval  Technology 
Code 

800 N. Quincy Street
Arlington, VA 22217-5000

Dr.  Lee  Cronbach 
L  f  urnum  Road 
Atherton,  CA  O'»?05 

LT  Judy  Crookshanks 
Chief  of  Naval  Operations 
OP-1 1,?G5 

Washington,  DC  20370-2000 

CAPT  P.  Michael  Curran 
Office  of  Naval  Research 
800 N. Quincy St.

Code  125 

Arlington,  VA  22217-5000 


Dr.  Cary  Czichon 
Mail  Station  31107 
Texas Instruments AI Lab
P.O.  Box  <405 
Lewisville,  TX  75067 

Dr.  Natalie  Dehn 
Department  of  Computer  and 
Information  Science 
University  of  Oregon 
Eugene, OR 97403

Dr.  Sharon  Derry 
Florida  State  University 
Department  of  Psychology 
Tallahassee,  FL  32306 

Defense  Technical 
Information  Center 
Cameron  Station,  Bldg  5 
Alexandria,  VA  22314 
Attn:  TC 
(12  Copies) 

Dr.  Thomas  M.  Duffy 
Communications  Design  Center 
Carnegie-Mellon University
Schenley Park
Pittsburgh, PA 15213

Dr.  Richard  Duran 
University  of  California 
Santa  Barbara,  CA  93106 

Dr.  John  Ellis 

Navy  Personnel  R&D  Center 

San Diego, CA 92152

Dr.  Richard  Elster 
Deputy  Assistant  Secretary 
of  the  Navy  (Manpower) 
OASN  (M&RA) 

Department  of  the  Navy 
Washington,  DC  20350-1000 

Dr. Susan Embretson
University of Kansas
Psychology Department
Lawrence, KS 66045




Dr.  William  Epstein 
University  of  Wisconsin 
W.  J.  Brogden  Psychology  Bldg. 
1202  W.  Johnson  Street 
Madison, WI 53706

ERIC  Facility-Acquisitions 
4833 Rugby Avenue
Bethesda, MD 20014

Dr.  Pat  Federico 
Code  SI’ 

NPRDC 

San  Diego,  CA  92152 

Dr.  Paul  Feltovich 
Southern  Illinois  University 
School  of  Medicine 
Medical  Education  Department 
P.O.  Box  1926 
Springfield,  IL  62708 

Mr.  Wallace  Feurzeig 
Educational  Technology 
Bolt Beranek & Newman
10  Moulton  St. 

Cambridge,  MA  02238 

Dr.  Baruch  Fischhoff 
Perceptronics,  Inc. 

6271  Variel  Avenue 
Woodland Hills, CA 91367

J.  D.  Fletcher 
9931  Corsica  Street 
Vienna, VA 22180

Dr.  Linda  Flower 
Carnegie-Mellon  University 
Department  of  English 
Pittsburgh, PA 15213

Dr.  Carl  H.  Frederiksen 
McGill  University 
3700 McTavish Street
Montreal, Quebec H3A 1Y2
CANADA 

Dr.  John  R.  Frederiksen 
Bolt  Beranek  &  Newman 
50  Moulton  Street 
Cambridge,  MA  02138 


Dr.  Alfred  R.  Fregly 
AFOSR/NL 

Bolling  AFB,  DC  20332 

Dr.  Robert  M.  Gagne 
1456  Mitchell  Avenue 
Tallahassee,  FL  32303 

Dr.  Dedre  Gentner 
University  of  Illinois 
Department  of  Psychology 
603  E.  Daniel  St. 

Champaign,  IL  61820 

Dr.  Herbert  Ginsburg 
University  of  Rochester 
Graduate  School  of 
Education 

Rochester,  NY  14627 

Dr.  Robert  Glaser 
Learning Research
& Development Center
University  of  Pittsburgh 
3939  O'Hara  Street 
Pittsburgh,  PA  15260 

Dr.  Arthur  M.  Glenberg 
University  of  Wisconsin 
W.  J.  Brogden  Psychology  Bldg. 
1202  W.  Johnson  Street 
Madison,  WI  53706 

Dr.  Marvin  D.  Glock 
13  Stone  Hall 
Cornell  University 
Ithaca,  NY  14353 

Dr.  Sam  Glucksberg 
Princeton  University 
Department  of  Psychology 
Green  Hall 

Princeton,  NJ  08540 

Dr.  Susan  Goldman 
University of California
Santa  Barbara,  CA  93106 

Dr.  Sherrie  Gott 

AFHRL/MODJ 

Brooks  AFB,  TX  78235 




Dr.  Richard  H.  Granger 
Department  of  Computer  Science 
University  of  California,  Irvine 
Irvine, CA 92717

Dr.  Wayne  Gray 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria, VA 22333

Dr. Bert Green
Johns Hopkins University
Department of Psychology
Charles & 34th Street
Baltimore,  MD  21215 

Dr.  James  G.  Greeno 
University  of  California 
Berkeley, CA 94720

Dr.  Henry  M.  Halff 
Halff  Resources,  Inc. 
4918 33rd Road, North
Arlington,  VA  22207 

Dr.  Ray  Hannapel 
Scientific  and  Engineering 
Personnel  and  Education 
National  Science  Foundation 
Washington, DC 20550

Janice  Hart 
Office  of  the  Chief 
of  Naval  Operations 
OP-1 1HD 

Department  of  the  Navy 
Washington,  D.C.  20^50-2000 

Dr.  Reid  Hastie 
Northwestern  University 
Department  of  Psychology 
Evanston,  IL  60201 

Prof.  John  R.  Hayes 
Carnegie-Mellon  University 
Department  of  Psychology 
Schenley  Park 
Pittsburgh,  PA  15213 


Dr.  Melissa  Holland 
Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 

Dr. Keith Holyoak
University of Michigan
Human Performance Center
330 Packard Road
Ann Arbor, MI 48109

Dr. Huynh Huynh
College  of  Education 
Univ.  of  South  Carolina 
Columbia,  SC  29203 

Dr.  Dillon  Inouye 
WICAT  Education  Institute 
Provo, UT 84057

Dr.  Alice  Isen 
Department  of  Psychology 
University  of  Maryland 
Catonsville,  MD  21228 

Dr. Claude Janvier
Directeur, CIRADE
Université du Québec à Montréal
Montreal, Quebec H3C 3P8
CANADA

Dr.  Robin  Jeffries 
Computer Research Center
Hewlett-Packard Laboratories
1501 Page Mill Road
Palo Alto, CA 94304

Margaret Jerome
c/o Dr. Peter Chandler
83, The Drive
Hove
Sussex
UNITED KINGDOM

Dr.  Daniel  Kahneman 
The  University  of  British  Columbia 
Department  of  Psychology 
«15«-2053  Main  Mall 
Vancouver,  British  Columbia 
CANADA  V6T  1Y7 




Dr.  Milton  S.  Katz 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria, VA 22333

Dr. Wendy Kellogg
IBM T. J. Watson Research Ctr.
P.O. Box 218
Yorktown Heights, NY 10598

Dr.  Dennis  Kibler 
University  of  California 
Department  of  Information 
and  Computer  Science 
Irvine,  CA  92717 

Dr.  David  Kieras 
University  of  Michigan 
Technical  Communication 
College  of  Engineering 
122’  E.  Engineering  Building 
Ann Arbor, MI 48109

Dr.  Peter  Kincaid 
Training  Analysis 
&  Evaluation  Group 
Department  of  the  Navy 
Orlando,  FL  32813 

Dr.  Walter  Kintsch 
Department  of  Psychology 
University  of  Colorado 
Campus Box 345
Boulder,  CO  80302 

Dr.  David  Klahr 
Carnegie-Mellon  University 
Department  of  Psychology 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Janet  L.  Kolodner 
Georgia  Institute  of  Technology 
School  of  Information 
& Computer Science
Atlanta, GA 30332

Dr. Kenneth Kotovsky
Department  of  Psychology 
Community  College  of 
Allegheny  County 
800  Allegheny  Avenue 
Pittsburgh,  PA  15233 


Dr.  David  R.  Lambert 
Naval  Ocean  Systems  Center 
Code  441T 

271  Catalina  Boulevard 
San  Diego,  CA  92152 

Dr.  Jean  Lave 
School  of  Social  Sciences 
University  of  California 
Irvine,  CA  92717 

Dr.  Robert  Lawler 
Information  Sciences,  FRL 
GTE  Laboratories,  Inc. 

40  Sylvan  Road 
Waltham,  MA  02254 

Dr.  Alan  M.  Lesgold 
Learning  R&D  Center 
University  of  Pittsburgh 
Pittsburgh,  PA  15260 

Dr.  Clayton  Lewis 
University  of  Colorado 
Department  of  Computer  Science 
Campus  Box  430 
Boulder,  CO  80309 

Dr.  Marcia  C.  Linn 
Lawrence  Hall  of  Science 
University  of  California 
Berkeley,  CA  94720 

Dr.  Sandra  P.  Marshall 
Dept. of Psychology
San  Diego  State  University 
San  Diego,  CA  92182 

Dr.  Richard  E.  Mayer 
Department  of  Psychology 
University  of  California 
Santa  Barbara,  CA  93106 

Dr.  Kathleen  McKeown 
Columbia  University 
Department  of  Computer  Science 
New  York,  NY  10027 

Dr. Joe McLachlan
Navy Personnel R&D Center
San Diego, CA 92152




Dr. James McMichael
Assistant for MPT Research,
Development, and Studies
NAVOP 01B7
Washington, DC 20370

Dr.  Jose  Mestre 
University  of  Massachusetts 
^01  Goodell  Building 
Amherst,  MA 

Dr.  George  A.  Miller 
Department  of  Psychology 
Green  Hall 

Princeton  University 
Princeton, NJ 08540

Dr.  William  Montague 

NPRDC  Code  17 

San Diego, CA 92152

Dr. Tom Moran
Xerox  PARC 

Coyote  Hill  Road 
Palo  Alto,  CA 

Dr.  Allen  Munro 
Behavioral  Technology 
Laboratories - USC
1845 S. Elena Ave., 4th Floor
Redondo Beach, CA 90277

Assistant  for  MPT  Research, 
Development  and  Studies 
NAVOP 01B7
Washington, DC 20370

Dr.  Richard  F.  Nisbett 
University  of  Michigan 
Institute  for  Social  Research 
Room 

Ann Arbor, MI 48109

Director, Training Laboratory,
NPRDC (Code 05)

San  Diego,  CA  92152 

Director, Manpower and Personnel
Laboratory, 

NPRDC  (Code  06) 

San  Diego,  CA  92152 


Director,  Human  Factors 

&  Organizational  Systems  Lab, 
NPRDC  (Code  07) 

San Diego, CA 92152

Fleet  Support  Office, 

NPRDC  (Code  ROD 
San  Diego,  CA  92152 

Library, NPRDC

Code  P201L 

San  Diego,  CA  92152 

Dr.  Harry  F.  O'Neil,  Jr. 
University  of  Southern  California 
School  of  Education  —  WPH  B01 
Dept. of Educational

Psychology  and  Technology 
Los  Angeles,  CA  90039-0031 

Dr.  Stellan  Ohlsson 
Learning  R  &  D  Center 
University  of  Pittsburgh 
3939  O'Hara  Street 
Pittsburgh, PA 15213

Office  of  Naval  Research, 

Code 1142
800  N.  Quincy  St. 

Arlington,  VA  22217-5000 

Office  of  Naval  Research, 

Code 1142EP
800 N. Quincy Street
Arlington, VA 22217-5000

Office  of  Naval  Research, 

Code 1142PT
800  N.  Quincy  Street 
Arlington,  VA  22217-5000 
(6  Copies) 

Office  of  Naval  Research, 

Code  125 

800 N. Quincy Street
Arlington,  VA  22217-5000 

Psychologist 

Office  of  Naval  Research 
Branch  Office,  London 
Box  39 

FPO  New  York,  NY  09510 




Special Assistant for Marine
Corps  Matters, 

ONR  Code  OOMC 
800  N.  Quincy  St. 

Arlington,  VA  22217-5000 

Psychologist 

Office  of  Naval  Research 
Liaison  Office,  Far  East 
APO  San  Francisco,  CA  96503 

Dr.  Judith  Orasanu 
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria, VA 22333

Dr.  Jesse  Orlansky 
Institute  for  Defense  Analyses 
1801  N.  Beauregard  St. 
Alexandria, VA 22311

Dr. Roy Pea
Bank Street College of
Education
610 W. 112th Street
New York, NY 10025

Dr.  Ray  Perez 
ARI (PERI-TT)
5001 Eisenhower Avenue
Alexandria, VA 22333

Dr.  David  N.  Perkins 
Educational  Technology  Center 
33"^  Gutman  Library 
Appian  Way 
Cambridge, MA 02138

Dr.  Nancy  Perry 
Chief  of  Naval  Education 
and  Training,  Code  OOA2A 
Naval  Station  Pensacola 
Pensacola, FL 32508

Dr.  Tjeerd  Plomp 

Twente  University  of  Technology 

Department  of  Education 

P.O.  Box  217 

7500  AE  ENSCHEDE 

THE  NETHERLANDS 


Dr. Martha Polson
Department of Psychology
Campus Box 346
University  of  Colorado 
Boulder,  CO  80309 

Dr. Peter Polson
University  of  Colorado 
Department  of  Psychology 
Boulder,  CO  80309 

Dr. Steven E. Poltrock
MCC
9430 Research Blvd.

Echelon  Bldg  #1 
Austin,  TX  78759-6509 

Dr. Sukai Prom-Jackson
1421 Massachusetts Ave., NW
#602

Washington,  DC  20005 

Dr. Joseph Psotka
ATTN:  ’>'•81-10 
Army  Research  Institute 
5001  Eisenhower  Ave. 

Alexandria,  VA  22333 

Dr. Lynne Reder
Department  of  Psychology 
Carnegie-Mellon  University 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr. Mary S. Riley
Program in Cognitive Science
Center for Human Information
Processing

University  of  California 
La  Jolla,  CA  92093 

Dr.  Andrew  M.  Rose 
American  Institutes 
for  Research 

1055 Thomas Jefferson St., NW
Washington,  DC  20007 

Dr.  William  B.  Rouse 
Georgia  Institute  of  Technology 
School of Industrial & Systems
Engineering 
Atlanta,  GA  30332 




Dr. Roger Schank
Yale University
Computer Science Department
P.O. Box 2158
New Haven, CT 06520

Dr.  Janet  Schofield 
Learning  R&D  Center 
University  of  Pittsburgh 
Pittsburgh,  PA  15260 

Dr.  Marc  Sebrechts 
Department  of  Psychology 
Wesleyan  University 
Middletown,  CT  06«»75 

Dr.  Judith  Segal 

Room  ® 1 9F 

NIE

1200  19th  Street  N.W. 
Washington,  DC  20202 

Dr. Sylvia A. S. Shafto
National Institute of Education
1200 19th Street
Mail Stop 1806
Washington, DC 20208

Dr. Lee Shulman
Stanford University
1040 Cathcart Way
Stanford, CA 94305


Dr.  Edward  E.  Smith 
Bolt  Beranek  &  Newman,  Inc. 
50  Moulton  Street 
Cambridge,  MA  02138 

Dr.  Richard  E.  Snow 
Department  of  Psychology 
Stanford  University 
Stanford, CA 94305

Dr.  Richard  Sorensen 
Navy  Personnel  R&D  Center 
San Diego, CA 92152

Dr.  Kathryn  T.  Spoehr 
Brown  University 
Department  of  Psychology 
Providence,  RI  02912 

Dr.  Robert  Sternberg 
Department  of  Psychology 
Yale  University 
Box  11A,  Yale  Station 
New  Haven,  CT  06520 

Dr.  Thomas  Sticht 

Navy  Personnel  R&D  Center 

San  Diego,  CA  92152 

Dr.  John  Tangney 
AFOSR/NL 

Bolling AFB, DC 20332




Dr. Randall Shumaker
Naval Research Laboratory
Code 7510
Overlook Avenue, S.W.
Washington, DC 20375-5000

Dr. Zita M. Simutis
Instructional Technology
Systems Area
ARI
5001 Eisenhower Avenue
Alexandria, VA 22333

Dr. H. Wallace Sinaiko
Manpower Research
and Advisory Services
Smithsonian Institution
801 North Pitt Street
Alexandria, VA 22314




Dr.  Amos  Tversky 
Stanford  University 
Dept. of Psychology
Stanford, CA 94305

Dr.  James  Tweeddale 
Technical  Director 
Navy  Personnel  R&D  Center 
San  Diego,  CA  92152 

Dr. Paul Twohig
Army  Research  Institute 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333 


Headquarters,  U.  S.  Marine  Corps 
Code  MPI-20 
Washington, DC 20380




Dr.  Wallace  Wulfeck,  III 
Navy  Personnel  R&D  Center 
San Diego, CA 92152

Dr.  Joe  Yasatuke 
AFHRL/LRT 

Lowry  AFB,  CO  80230 

Dr. Masoud Yazdani
Dept. of Computer Science
University of Exeter
Exeter EX4 4QL
Devon,  ENGLAND 

Dr.  Joseph  L.  Young 
Memory  &  Cognitive 
Processes 

National  Science  Foundation 
Washington, DC 20550


Dr. Barbara White
Bolt Beranek & Newman, Inc.
10 Moulton Street
Cambridge, MA 02238

LCDR  Cory  deGroot  Whitehead 
Chief  of  Naval  Operations 
OP-1 12C1 

Washington,  DC  20370-2000 

Dr.  Mike  Williams 
IntelliGenetics
124 University Avenue
Palo Alto, CA 94301

Dr.  Robert  A.  Wisher 
U.S.  Army  Institute  for  the 

Behavioral  and  Social  Sciences 
5001  Eisenhower  Avenue 
Alexandria, VA 22333

Dr.  Martin  F.  Wiskoff 
Navy Personnel R&D Center
San  Diego,  CA  92152 


Dr.  Kurt  Van  Lehn 
Department  of  Psychology 
Carnegie-Mellon  University 
Schenley  Park 
Pittsburgh,  PA  15213 

Dr.  Howard  Wainer 
Division of Psychological Studies
Educational Testing Service
Princeton, NJ 08541

Dr. Beth Warren
Bolt Beranek & Newman, Inc.
50 Moulton Street
Cambridge, MA 02138

Dr.  Keith  T.  Wescourt 
FMC  Corporation 
Central  Engineering  Labs 
ll^'i  Coleman  Ave.,  Box  5"0 
Santa  Clara,  CA  95052 


Mr. John H. Wolfe
Navy Personnel R&D Center
San Diego, CA 92152




